Rewrite manifests #1661
Sorry for the many comments, most of them are easy to resolve. Thanks @amitgilad3 for working on this, I like how you reuse the `_ManifestMergeManager`
👍
pyiceberg/manifest.py
Outdated
@@ -861,6 +865,7 @@ def existing(self, entry: ManifestEntry) -> ManifestWriter:
entry.snapshot_id, entry.sequence_number, entry.file_sequence_number, entry.data_file
)
)
self._existing_files += 1
I believe we already do this here:
iceberg-python/pyiceberg/manifest.py
Line 823 in 7648803
self._existing_files += 1
pyiceberg/table/update/snapshot.py
Outdated
rewritten_manifests: List[ManifestFile]
added_manifests: List[ManifestFile]
How about slapping on some default factories: https://docs.python.org/3/library/dataclasses.html#dataclasses.field
- rewritten_manifests: List[ManifestFile]
- added_manifests: List[ManifestFile]
+ rewritten_manifests: List[ManifestFile] = field(default_factory=list)
+ added_manifests: List[ManifestFile] = field(default_factory=list)
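A minimal sketch of what the default factories buy here, assuming a simplified stand-in class (`RewriteResult` and the string manifest names are hypothetical, not the PR's actual types): each instance gets its own fresh list, so the class can be constructed with no arguments and instances never share state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RewriteResult:  # hypothetical stand-in for the result class in the PR
    # default_factory is called once per instance, so every instance
    # starts with its own empty list instead of a shared mutable default.
    rewritten_manifests: List[str] = field(default_factory=list)
    added_manifests: List[str] = field(default_factory=list)


first = RewriteResult()
second = RewriteResult()

# Mutating one instance leaves the other untouched.
first.rewritten_manifests.append("manifest-1.avro")
```

A plain `= []` default would be rejected by `dataclass` with a `ValueError`, which is why `field(default_factory=list)` is the usual spelling.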
pyiceberg/table/update/snapshot.py
Outdated
@@ -477,6 +478,20 @@ def _deleted_entries(self) -> List[ManifestEntry]:
return []

@dataclass(init=False)
What do you think of making this frozen? This gives some nice benefits like being hashable: https://docs.python.org/3/library/dataclasses.html#frozen-instances
Using the `default_factory`s, we can also drop the `init=False`.
- @dataclass(init=False)
+ @dataclass(frozen=True)
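A small sketch of how `frozen=True` behaves, using hypothetical class names rather than the PR's code. One caveat that may be worth checking: freezing only blocks attribute reassignment, so calling `.extend()` on a list field still works, while the generated `__hash__` hashes the field values and therefore fails for list fields. Tuple fields are one way to get an instance that is actually hashable.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class FrozenListResult:  # hypothetical: frozen, but with list fields
    rewritten_manifests: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class FrozenTupleResult:  # hypothetical: frozen, with immutable fields
    rewritten_manifests: Tuple[str, ...] = ()


result = FrozenListResult()
# In-place mutation of the list is still allowed on a frozen instance.
result.rewritten_manifests.extend(["m1.avro"])

# Rebinding the attribute is what frozen=True forbids.
try:
    result.rewritten_manifests = []  # type: ignore[misc]
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# The generated __hash__ hashes the fields, and lists are unhashable.
try:
    hash(result)
    list_variant_hashable = True
except TypeError:
    list_variant_hashable = False

# With tuple fields, equal instances hash equally.
tuple_variant_hashable = hash(FrozenTupleResult(("m1.avro",))) == hash(FrozenTupleResult(("m1.avro",)))
```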
pyiceberg/table/update/snapshot.py
Outdated
_table: Table
_spec: PartitionSpec
I don't think these are used.
pyiceberg/table/update/snapshot.py
Outdated
def _deleted_entries(self) -> List[ManifestEntry]:
"""To determine if we need to record any deleted manifest entries.

In case of an append, nothing is deleted.
Copy paste :)
pyiceberg/table/update/snapshot.py
Outdated
self.rewritten_manifests.extend(deletes_result.rewritten_manifests)
self.added_manifests.extend(deletes_result.added_manifests)

if not self.rewritten_manifests:
nit, just for clarity that it is a list:
- if not self.rewritten_manifests:
+ if len(self.rewritten_manifests) == 0:
pyiceberg/table/update/snapshot.py
Outdated
min_count_to_merge=2,
merge_enabled=True,
I think in Java we set the properties from the table properties:
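A rough sketch of what reading the merge settings from table properties could look like, instead of hard-coding `min_count_to_merge=2` and `merge_enabled=True`. Everything here is an assumption for illustration: the helper and the `MergeSettings` type are hypothetical, the property keys are the `commit.manifest.*` names as I understand them from the Iceberg configuration docs, and the default values are placeholders that should be checked against the actual `TableProperties` constants.

```python
from typing import Dict, NamedTuple

# Assumed property keys (verify against the real TableProperties constants).
MANIFEST_TARGET_SIZE_BYTES = "commit.manifest.target-size-bytes"
MANIFEST_MIN_MERGE_COUNT = "commit.manifest.min-count-to-merge"
MANIFEST_MERGE_ENABLED = "commit.manifest-merge.enabled"

# Placeholder defaults for this sketch only.
DEFAULT_TARGET_SIZE_BYTES = 8 * 1024 * 1024
DEFAULT_MIN_MERGE_COUNT = 100
DEFAULT_MERGE_ENABLED = False


class MergeSettings(NamedTuple):  # hypothetical container
    target_size_bytes: int
    min_count_to_merge: int
    merge_enabled: bool


def merge_settings_from_properties(properties: Dict[str, str]) -> MergeSettings:
    """Resolve merge settings from string-valued table properties."""
    enabled_raw = properties.get(MANIFEST_MERGE_ENABLED, str(DEFAULT_MERGE_ENABLED))
    return MergeSettings(
        target_size_bytes=int(properties.get(MANIFEST_TARGET_SIZE_BYTES, DEFAULT_TARGET_SIZE_BYTES)),
        min_count_to_merge=int(properties.get(MANIFEST_MIN_MERGE_COUNT, DEFAULT_MIN_MERGE_COUNT)),
        # Table properties are strings, so booleans are parsed explicitly.
        merge_enabled=enabled_raw.strip().lower() == "true",
    )


settings = merge_settings_from_properties(
    {MANIFEST_MIN_MERGE_COUNT: "2", MANIFEST_MERGE_ENABLED: "true"}
)
```

Whether a rewrite action should honour `merge_enabled` at all, or always merge, is a design question for the PR; the sketch only shows the lookup.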
assert len(result.rewritten_manifests) == 4, "Action should rewrite 4 manifests"
assert len(result.added_manifests) == 2, "Action should add 2 manifests"

new_manifests = tbl.current_snapshot().manifests(tbl.io)
Nothing wrong with this, but `tbl.inspect` might also be helpful to assert the manifests.
@amitgilad3 gentle ping, are you still interested in working on this?
Hey @Fokko, yes, I am very interested in finishing this. I must have missed this (sorry), will look at this later today :)
abbb01c to 6913f09
Looks like the CI is sad 😞
ssc = SnapshotSummaryCollector()
partition_summary_limit = int(
self._transaction.table_metadata.properties.get(
TableProperties.WRITE_PARTITION_SUMMARY_LIMIT, TableProperties.WRITE_PARTITION_SUMMARY_LIMIT_DEFAULT
)
)
ssc.set_partition_summary_limit(partition_summary_limit)
- ssc = SnapshotSummaryCollector()
- partition_summary_limit = int(
- self._transaction.table_metadata.properties.get(
- TableProperties.WRITE_PARTITION_SUMMARY_LIMIT, TableProperties.WRITE_PARTITION_SUMMARY_LIMIT_DEFAULT
- )
- )
- ssc.set_partition_summary_limit(partition_summary_limit)
+ ssc = SnapshotSummaryCollector(int(
+ self._transaction.table_metadata.properties.get(
+ TableProperties.WRITE_PARTITION_SUMMARY_LIMIT, TableProperties.WRITE_PARTITION_SUMMARY_LIMIT_DEFAULT
+ )
+ ))
Hey @Fokko - just wanted to say thanks for reviewing (really appreciate it).
@Fokko gentle ping, was wondering if we still have any blockers?
…ntation `LegacyOAuth2AuthManager` (apache#1981)

# Rationale for this change

Replace existing Auth handling with `LegacyOAuth2AuthManager`. Tracking issue: apache#1909

There will be follow up PRs to this PR that will address the following:
- introduce a mechanism for using a custom `AuthManager` implementation, along with the ability to use a set of config parameters
- introduce a `OAuth2AuthManager` that more closely follows the OAuth2 protocol, and also uses a separate thread to proactively refreshes the token, rather than reactively refreshing the token on `UnAuthorizedError` or the deprecated `AuthorizationExpiredError`.

# Are these changes tested?

Yes, both through unit and integration tests

# Are there any user-facing changes?

Yes - previously, if `TOKEN` and `CREDENTIAL` are both defined, `oauth/tokens` endpoint wouldn't be used to refresh the token with client credentials when the `RestCatalog` was initialized. However, `oauth/tokens` endpoint would be used on retries that handled 401 or 419 error. This erratic behavior will now be updated as follows:
- if `CREDENTIAL` is defined, `oauth/tokens` endpoint will be used to fetch the access token using the client credentials both when the RestCatalog is initialized, and when the refresh_tokens call is made as a reaction to 401 or 419 error.
- if both `CREDENTIAL` and `TOKEN` are defined, we will follow the above behavior.
- if only `TOKEN` is defined, the initial token will be used instead

# Rationale for this change

Add support for the Hugging Face filesystem in `fsspec`, which uses `hf://` paths. This allows to import [HF datasets](https://huggingface.co/datasets). Authentication is done using the `"hf.token"` property.

# Are these changes tested?

I tried locally but haven't added tests in test_fsspec.py (lmk if it's a requirement)

# Are there any user-facing changes?

No changes, it simply adds support for `hf://` URLs

Closes apache#2015

# Rationale for this change

Enable the input of the ADLFS option `account_host` via the properties option `adls.account-host`

# Are these changes tested?

Yes

# Are there any user-facing changes?

The option is now available :)

Closes apache#2124

# Rationale for this change

Refactor `Makefile`
* Grouped commands and added comments
* Added `COVERAGE` param to run test with coverage
* Added `COVERAGE_FAIL_UNDER` param to specify coverage threshold to pass
* Change test coverage threshold to `85` for unit tests, we are currently at `87`
* Change test coverage threshold to `75` for integration tests, we are currently at `77`

CI
* Add s3/adls/gcs integration tests to run in CI
* Run tests with coverage report

Note the `gcsfs` issue was resolved by apache#2127

# Are these changes tested?

Yes

# Are there any user-facing changes?

No

Should help with apache#2130 and apache#2132

Modifies `Table.add_files` to explicitly use `inspect.data_files` and also parallelize `inspect._files`

I didn't see anywhere else where looping over manifest entries was parallelized, so seems better to parallelize across manifests than within.

No changes here but should be faster.

Co-authored-by: Kevin Liu

# Rationale for this change

Moar test coverage :)

# Are these changes tested?

# Are there any user-facing changes?

# Rationale for this change

Missing parameter in REST Catalog documentation

# Are these changes tested?

Doc only

# Are there any user-facing changes?

Doc only

Co-authored-by: Fokko Driesprong
Co-authored-by: Kevin Liu
Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.38.22 to 1.38.42.

# Rationale for this change

Follow up to apache#2137

Remove all mentions of `CREATE OR REPLACE` issue, apache/iceberg#8756

# Are these changes tested?

# Are there any user-facing changes?

# Rationale for this change

`make test` fails locally since `Config().get_known_catalogs()` also reads my local `~/.pyiceberg.yaml`

Follow up to apache#2088

# Are these changes tested?

# Are there any user-facing changes?

Closes apache#1929

# Rationale for this change

- Since we want to support snapshot write compatibility (apache#1772) and is part of the following parent issue apache#819

# Are these changes tested?

Yes

# Are there any user-facing changes?

No

Co-authored-by: Jayce Slesar
Co-authored-by: Fokko Driesprong

…pache#2141)

# Rationale for this change

Added new configuration parameter hive.kerberos-service-name (apache#2032)

hive.kerberos-service-name Defaults to "hive"

# Are these changes tested?

added unit test.

# Are there any user-facing changes?

this change adds an optional configuration parameter for the hive catalog (hive.kerberos-service-name) which defaults to "hive". the change includes doc updates.

Co-authored-by: Colm Dougan

fixing apache#2122

# Rationale for this change

# Are these changes tested?

Yes tested locally

# Are there any user-facing changes?

Nope, just use them

Co-authored-by: Kevin Liu
Co-authored-by: Fokko Driesprong
Bumps [mypy-boto3-dynamodb](https://github.com/youtype/mypy_boto3_builder) from 1.38.4 to 1.39.0.

Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.38.42 to 1.39.0.

Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub) from 0.33.0 to 0.33.1.

Bumps [pyroaring](https://github.com/Ezibenroc/PyRoaringBitMap) from 1.0.1 to 1.0.2.
Fixes: apache#306 --------- Co-authored-by: Kevin Liu <[email protected]>
…he#2036)

# Rationale for this change

Like the Iceberg jar, we should also update the Hive storage descriptor after committing metadata. See: https://github.com/apache/iceberg/blob/b504f9c51c6c0e0a5c0c5ff53f295e69b67d8e59/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L170

# Are these changes tested?

Yes, with new unit tests.

# Are there any user-facing changes?

No
# Rationale for this change

See apache#2013. Closes apache#2064.

Continuing the trend, but with Glue.

# Are these changes tested?

See test below

# Are there any user-facing changes?

When a user specifies property updates while committing a table, those parameters are passed to the Glue client.
# Rationale for this change

The scan's `row_filter` param is not very intuitive. I got tired of reading over the expression and parser code while building out statements, so I had some docs made up.

# Are these changes tested?

They are docs only, so not really.

# Are there any user-facing changes?

Yes, there are now docs for the expression and string syntaxes of `row_filter`.

---------

Co-authored-by: Fokko Driesprong <[email protected]>
# Rationale for this change

- Added pyiceberg table integration so that a pyiceberg `Table` can be passed directly to datafusion's `register_table_provider`
- Added `datafusion` as an optional dependency
- Added docs for the integration: <img width="1279" alt="Screenshot 2025-07-06 at 10 59 44 AM" src="https://github.com/user-attachments/assets/f41f08e6-dd41-4012-ad96-2eaae805d28e" />

# Are these changes tested?

Yes

# Are there any user-facing changes?
I noticed that the docs needed some TLC.

- Collapse some lines to make the docs more compact.
- Avoid imports where possible (e.g. transforms).
- Update docs.
- Add an example of `to_arrow_batch_reader` earlier in the docs.

---------

Co-authored-by: Jayce Slesar <[email protected]>
apache#2153)

# Rationale for this change

Hello! Setting or removing table properties from the console currently raises a `Writing is WIP` error. This PR adds the code to set and remove table properties.

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, setting and removing table properties from the console now works.
2. write tests for rewrite manifests
This is an initial implementation of rewrite manifests, aiming to mimic the Java implementation as closely as possible. I’ve tried to follow the same structure and logic, but there are still some areas that might need refinement.
I’m looking for feedback and suggestions on:
• Whether the approach aligns well with the existing design.
• Any gaps or optimizations that could improve performance.
• How best to proceed with completing this feature.
Would love any insights or guidance on the next steps! Thanks in advance for the review! 🙌
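To make the discussion a bit more concrete, here is a rough, library-free sketch of the general idea behind rewriting manifests: group small manifests into bins of roughly a target size so that each group can be merged into one new manifest. This is not the code in this PR and not pyiceberg's API. `ManifestInfo`, `RewritePlan`, `plan_rewrite`, and the greedy grouping rule are all hypothetical names and simplifications made up for illustration; the sketch only assumes that each manifest has a known size in bytes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ManifestInfo:
    """Hypothetical stand-in for a manifest file: a path and its size in bytes."""

    path: str
    length: int


@dataclass
class RewritePlan:
    """Result of planning: groups to merge, plus manifests left untouched."""

    # Default factories give every plan its own fresh lists.
    groups: List[List[ManifestInfo]] = field(default_factory=list)
    kept: List[ManifestInfo] = field(default_factory=list)


def _close_group(plan: RewritePlan, group: List[ManifestInfo]) -> None:
    """Record a finished group; a group of one has nothing to merge with."""
    if len(group) > 1:
        plan.groups.append(group)
    else:
        plan.kept.extend(group)


def plan_rewrite(manifests: List[ManifestInfo], target_size: int) -> RewritePlan:
    """Greedily pack manifests, in order, into groups of at most target_size bytes."""
    plan = RewritePlan()
    current: List[ManifestInfo] = []
    current_size = 0
    for manifest in manifests:
        # Manifests that already reach the target size are kept as they are.
        if manifest.length >= target_size:
            plan.kept.append(manifest)
            continue
        # Start a new group when adding this manifest would overflow the bin.
        if current and current_size + manifest.length > target_size:
            _close_group(plan, current)
            current, current_size = [], 0
        current.append(manifest)
        current_size += manifest.length
    _close_group(plan, current)
    return plan


manifests = [
    ManifestInfo("a.avro", 40),
    ManifestInfo("b.avro", 30),
    ManifestInfo("c.avro", 120),
    ManifestInfo("d.avro", 50),
    ManifestInfo("e.avro", 10),
]
plan = plan_rewrite(manifests, target_size=100)
for group in plan.groups:
    print("merge:", [m.path for m in group])
print("keep:", [m.path for m in plan.kept])
```

In a real implementation the interesting parts are exactly what this sketch leaves out: reading the entries of each group, writing them to a new manifest as existing entries, and committing a snapshot that swaps the rewritten manifests for the added ones.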